Sampling from around the world
nishio At a recent social event, in response to the opinion that "if LLMs learn from LLM output, their performance will degrade," my first reaction was "no, a model can get smarter even when learning from its own data, as AlphaGo did." On reflection, though, both opinions seemed a bit off.

nishio First, the claim runs: "everyone posts GPT-generated text on the net, future LLMs will mistake it for human input and learn from it, and therefore performance will degrade." The premise does not hold, because LLM-generated text can be identified and filtered out.

nishio As for "self-generated data like AlphaGo": in Go, the "world" of the board and win/loss outcomes is an executable description, a program. When an LLM learns from its own generated text, that is not sampling from the world.

nishio With ChatGPT, a "world where text and image input comes in, and reacting to it brings the next input" has been created, and 100 million active users are in it. It is one answer to "eventually there will be no more data on the Web to feed LLMs" and "where the data is created will become important."

nishio Programming is a "defined world": when you write code and run it, you get results. So, as with AlphaGo, "sampling from the world" can be self-generated.

nishio This makes an LLM's programming ability easier to grow than its other abilities. The probability that generated code works on the first try will keep rising, making it more productive than humans typing in every single character (and typo). This will lead to the end of the style of programming in which humans input everything directly.

---
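The "defined world" idea above can be sketched in a few lines. This is a minimal, hypothetical illustration (not nishio's or anyone's actual training pipeline): self-generated candidate programs are run against tests, and only the ones the "world" (the interpreter) verifies are kept as new data, analogous to AlphaGo keeping the outcome of a self-played game. `candidate_programs` stands in for samples drawn from an LLM.

```python
# Hypothetical sketch: program execution as a "world" that labels
# self-generated data, in the spirit of AlphaGo's self-play.

candidate_programs = [
    "def add(a, b): return a - b",   # wrong
    "def add(a, b): return a + b",   # correct
    "def add(a, b): return a * b",   # wrong
]

def passes_tests(source: str) -> bool:
    """Execute a candidate and check it against a tiny test suite.
    The execution result is the 'sampling from the world'."""
    namespace = {}
    try:
        exec(source, namespace)
        add = namespace["add"]
        return add(1, 2) == 3 and add(-1, 1) == 0
    except Exception:
        return False

# Only candidates the world verified become new training data.
verified = [p for p in candidate_programs if passes_tests(p)]
print(len(verified))  # 1
```

The point of the sketch is that the feedback signal comes from running the code, not from the model's own text, which is what distinguishes this loop from an LLM simply retraining on its own output.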
This page is auto-translated from /nishio/世界からのサンプリング using DeepL. If you see something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.